[Redis Study Notes] Redis 5.0 New Features: Dynamic hz and Related Optimizations

Redis 5.0 adds a dynamic-hz parameter that adaptively adjusts the value of server.hz, trading a little idle CPU for better responsiveness. Redis 5.0 also optimizes the INFO command, the clientsCron function, and the handling of large numbers of short-lived connections.

Dynamic hz Parameter

Unexplained Latency

Before Redis 5.0, server.hz defaulted to 10, so the serverCron function ran every 100ms. serverCron calls clientsCron, which performs per-client housekeeping such as handling client timeouts and resizing client query buffers. clientsCron looks like this:

#define CLIENTS_CRON_MIN_ITERATIONS 5
void clientsCron(void) {
    /* Try to process at least numclients/server.hz of clients
     * per call. Since normally (if there are no big latency events) this
     * function is called server.hz times per second, in the average case we
     * process all the clients in 1 second. */
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;
    mstime_t now = mstime();

    /* Process at least a few clients while we are at it, even if we need
     * to process less than CLIENTS_CRON_MIN_ITERATIONS to meet our contract
     * of processing each client once per second. */
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;

        /* Rotate the list, take the current head, process.
         * This way if the client must be removed from the list it's the
         * first element and we don't incur into O(N) computation. */
        listRotate(server.clients);
        head = listFirst(server.clients);
        c = listNodeValue(head);
        /* The following functions do different service checks on the client.
         * The protocol is that they return non-zero if the client was
         * terminated. */
        if (clientsCronHandleTimeout(c,now)) continue;
        if (clientsCronResizeQueryBuffer(c)) continue;
        if (clientsCronTrackExpansiveClients(c)) continue;
    }
}

As the comment in the code explains, to process every client once per second, each call must handle numclients/server.hz clients, because clientsCron runs server.hz times per second. With many clients this becomes expensive: at 10,000 connections and the default hz of 10, each call has to process 1,000 clients.

Worse, this is internal Redis logic with no latency instrumentation, so when it takes a long time, nothing shows up in the slow log or the latency monitor, producing unexplained latency spikes in production.

Dynamic hz Implementation

Redis 5.0 therefore adds the dynamic-hz parameter, enabled by default. When there are many connected clients, Redis adaptively and temporarily raises hz, so serverCron runs more times per second: it burns more CPU, but each call processes only a bounded number of clients, avoiding severe latency spikes.

On every pass, serverCron checks the client count and sets hz accordingly:

#define CONFIG_MAX_HZ            500
#define MAX_CLIENTS_PER_CLOCK_TICK 200          /* HZ is adapted based on that. */

server.hz = server.config_hz;   /* config_hz defaults to 10 */
/* Adapt the server.hz value to the number of configured clients. If we have
 * many clients, we want to call serverCron() with a higher frequency. */
if (server.dynamic_hz) {
    while (listLength(server.clients) / server.hz >
           MAX_CLIENTS_PER_CLOCK_TICK)
    {
        server.hz *= 2;
        if (server.hz > CONFIG_MAX_HZ) {
            server.hz = CONFIG_MAX_HZ;
            break;
        }
    }
}

The loop targets at most 200 clients per tick, and hz is capped at CONFIG_MAX_HZ (500); in practice, a manually configured hz above 100 is not recommended. The configured hz (default 10) acts as a baseline: every serverCron pass first resets hz to the configured value and then scales it up if there are many clients, so on the next pass, if the client count has dropped, hz simply falls back to the baseline.
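
To make the adaptation concrete, here is a minimal standalone sketch of the doubling loop (this is not Redis source; the 10,000-client figure is a hypothetical example):

#include <stdio.h>

#define CONFIG_MAX_HZ              500
#define MAX_CLIENTS_PER_CLOCK_TICK 200

/* Replay the dynamic-hz doubling loop for a given client count. */
int adapted_hz(int config_hz, long numclients) {
    int hz = config_hz;
    while (numclients / hz > MAX_CLIENTS_PER_CLOCK_TICK) {
        hz *= 2;
        if (hz > CONFIG_MAX_HZ) {
            hz = CONFIG_MAX_HZ;
            break;
        }
    }
    return hz;
}

int main(void) {
    /* 10000/10=1000 -> hz 20 (500) -> hz 40 (250) -> hz 80 (125 <= 200). */
    printf("hz = %d\n", adapted_hz(10, 10000));   /* prints: hz = 80 */
    return 0;
}

With 10,000 clients, hz settles at 80: serverCron runs every 12.5ms and each clientsCron call handles about 125 clients.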

clientsCron Optimization

Implementation in 4.0

As noted above, processing too many clients in a single clientsCron pass can cause latency. In versions before Redis 5.0, the clientsCronResizeQueryBuffer function called from clientsCron had a bug that aggravated and amplified the problem.

The pre-5.0 clientsCronResizeQueryBuffer was implemented as:

int clientsCronResizeQueryBuffer(client *c) {
    size_t querybuf_size = sdsAllocSize(c->querybuf);
    time_t idletime = server.unixtime - c->lastinteraction;

    /* There are two conditions to resize the query buffer:
     * 1) Query buffer is > BIG_ARG and too big for latest peak.
     * 2) Client is inactive and the buffer is bigger than 1k. */
    if (((querybuf_size > PROTO_MBULK_BIG_ARG) &&
         (querybuf_size/(c->querybuf_peak+1)) > 2) ||
         (querybuf_size > 1024 && idletime > 2))
    {
        /* Only resize the query buffer if it is actually wasting space. */
        if (sdsavail(c->querybuf) > 1024) {
            c->querybuf = sdsRemoveFreeSpace(c->querybuf);
        }
    }
    /* Reset the peak again to capture the peak memory usage in the next
     * cycle. */
    c->querybuf_peak = 0;
    return 0;
}

The query buffer is shrunk when either of the following holds:

(1) the query buffer is larger than 32KB (PROTO_MBULK_BIG_ARG) and much larger than the recent peak of data it held;

(2) the query buffer is larger than 1KB and the client is currently idle;

and, in addition, the buffer must have more than 1KB of free space; only then is the free space reclaimed.

But the input buffer is already 32KB from its very first allocation, which is obviously larger than 1KB, so any client that is even briefly idle easily meets the shrink condition.

The input buffer is allocated in readQueryFromClient:

#define PROTO_IOBUF_LEN         (1024*16)  /* Generic I/O buffer size */

readlen = PROTO_IOBUF_LEN;
c->querybuf = sdsMakeRoomFor(c->querybuf, readlen);

When sdsMakeRoomFor performs this allocation, any request below 1MB is doubled, so the very first allocation is already 32KB.
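
For reference, this is the greedy preallocation rule inside sdsMakeRoomFor, abridged from sds.c into a hypothetical helper (allocation and header handling omitted):

#define SDS_MAX_PREALLOC (1024*1024)   /* 1MB, from sds.h */

/* Growth rule used by sdsMakeRoomFor (abridged): below 1MB the requested
 * length is doubled; above it, 1MB is added. */
size_t sds_newlen(size_t len, size_t addlen) {
    size_t newlen = len + addlen;      /* 0 + 16KB on the first read */
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;                   /* 16KB becomes 32KB */
    else
        newlen += SDS_MAX_PREALLOC;
    return newlen;
}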

Implementation in 5.0

In Redis 5.0, clientsCronResizeQueryBuffer is implemented as follows:

int clientsCronResizeQueryBuffer(client *c) {
    size_t querybuf_size = sdsAllocSize(c->querybuf);
    time_t idletime = server.unixtime - c->lastinteraction;

    /* There are two conditions to resize the query buffer:
     * 1) Query buffer is > BIG_ARG and too big for latest peak.
     * 2) Query buffer is > BIG_ARG and client is idle. */
    if (querybuf_size > PROTO_MBULK_BIG_ARG &&
         ((querybuf_size/(c->querybuf_peak+1)) > 2 ||
          idletime > 2))
    {
        /* Only resize the query buffer if it is actually wasting
         * at least a few kbytes. */
        if (sdsavail(c->querybuf) > 1024*4) {
            c->querybuf = sdsRemoveFreeSpace(c->querybuf);
        }
    }
    /* Reset the peak again to capture the peak memory usage in the next
     * cycle. */
    c->querybuf_peak = 0;
    //...
}

The conditions were changed to:

(1) the query buffer is larger than 32KB and much larger than the recent data peak;

(2) the query buffer is larger than 32KB and the client is currently idle;

and the buffer must now have more than 4KB of free space before it is reclaimed.

So before 5.0, a large population of mostly idle clients could force Redis to shrink many input buffers in a single pass, causing node latency or, in extreme cases, even crashes; in 5.0, buffers no larger than 32KB are simply left alone.

INFO Command Optimization

INFO in the Slow Log

In principle, INFO only returns basic statistics and should never hit the slow log, yet before Redis 5.0, INFO could appear there when the number of client connections was very large.

The reason is that the pre-5.0 INFO implementation computed, on every call, the largest input/output buffers across all clients, which required looping over every client connection. With a very large client count, say tens of thousands of connections, that loop became very expensive.

The INFO implementation calls genRedisInfoString, which in turn calls getClientsMaxBuffers (a sketch follows; see the source for the full code). getClientsMaxBuffers is invoked right at the top of genRedisInfoString, so every INFO subcommand is affected.
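
A minimal sketch of that O(N) scan, modeled on getClientsMaxBuffers from the Redis 4.x source (abridged; treat it as illustrative rather than verbatim):

/* Scan every connected client and report the largest output reply list
 * and the biggest input query buffer seen; O(N) in the client count. */
void getClientsMaxBuffers(unsigned long *longest_output_list,
                          unsigned long *biggest_input_buffer) {
    client *c;
    listNode *ln;
    listIter li;
    unsigned long lol = 0, bib = 0;

    listRewind(server.clients,&li);
    while ((ln = listNext(&li)) != NULL) {   /* one step per client */
        c = listNodeValue(ln);
        if (listLength(c->reply) > lol) lol = listLength(c->reply);
        if (sdslen(c->querybuf) > bib) bib = sdslen(c->querybuf);
    }
    *longest_output_list = lol;
    *biggest_input_buffer = bib;
}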

Optimization in 5.0

Redis 5.0 optimizes this. Instead of scanning every client on each call, clientsCron maintains a small array recording the per-second maxima of client input/output buffer sizes over the last few seconds. The INFO clients subcommand calls getExpansiveClientsInfo, which just takes the maximum over that array, so the cost is constant and tiny no matter how many clients are connected.

clientsCron calls clientsCronTrackExpansiveClients to track, across all clients, the input/output buffer maxima of the last few seconds:

int clientsCronTrackExpansiveClients(client *c) {
    size_t in_usage = sdsAllocSize(c->querybuf);
    size_t out_usage = getClientOutputBufferMemoryUsage(c);
    int i = server.unixtime % CLIENTS_PEAK_MEM_USAGE_SLOTS;
    int zeroidx = (i+1) % CLIENTS_PEAK_MEM_USAGE_SLOTS;

    /* Always zero the next sample, so that when we switch to that second, we'll
     * only register samples that are greater in that second without considering
     * the history of such slot.
     *
     * Note: our index may jump to any random position if serverCron() is not
     * called for some reason with the normal frequency, for instance because
     * some slow command is called taking multiple seconds to execute. In that
     * case our array may end containing data which is potentially older
     * than CLIENTS_PEAK_MEM_USAGE_SLOTS seconds: however this is not a problem
     * since here we want just to track if "recently" there were very expansive
     * clients from the POV of memory usage. */
    ClientsPeakMemInput[zeroidx] = 0;
    ClientsPeakMemOutput[zeroidx] = 0;

    /* Track the biggest values observed so far in this slot. */
    if (in_usage > ClientsPeakMemInput[i]) ClientsPeakMemInput[i] = in_usage;
    if (out_usage > ClientsPeakMemOutput[i]) ClientsPeakMemOutput[i] = out_usage;

    return 0; /* This function never terminates the client. */
}
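
The peak arrays referenced above are plain globals declared alongside the function in server.c; in Redis 5.0 the slot count is 8, i.e. roughly the last 8 seconds are tracked:

#define CLIENTS_PEAK_MEM_USAGE_SLOTS 8
size_t ClientsPeakMemInput[CLIENTS_PEAK_MEM_USAGE_SLOTS];   /* per-second input peaks */
size_t ClientsPeakMemOutput[CLIENTS_PEAK_MEM_USAGE_SLOTS];  /* per-second output peaks */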

The INFO implementation then calls genRedisInfoString –> getExpansiveClientsInfo:

/* Return the max samples in the memory usage of clients tracked by
 * the function clientsCronTrackExpansiveClients(). */
void getExpansiveClientsInfo(size_t *in_usage, size_t *out_usage) {
    size_t i = 0, o = 0;
    for (int j = 0; j < CLIENTS_PEAK_MEM_USAGE_SLOTS; j++) {
        if (ClientsPeakMemInput[j] > i) i = ClientsPeakMemInput[j];
        if (ClientsPeakMemOutput[j] > o) o = ClientsPeakMemOutput[j];
    }
    *in_usage = i;
    *out_usage = o;
}

Short-Lived Connection Optimization

Pre-5.0 Implementation

When Redis frees a client connection, it calls freeClient –> unlinkClient –> listSearchKey. In listSearchKey, Redis walks the doubly linked list server.clients to find the node holding the client object, then calls listDelNode to remove it from server.clients:

if (c->fd != -1) {
    /* Remove from the list of active clients. */
    ln = listSearchKey(server.clients,c);
    serverAssert(ln != NULL);
    listDelNode(server.clients,ln);

    /* Unregister async I/O handlers and close the socket. */
    aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
    aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
    close(c->fd);
    c->fd = -1;
}

listSearchKey is an O(N) traversal of the list, so when lots of short-lived connections hit Redis, the frequent client frees drive Redis CPU usage up noticeably.
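
For reference, the linear scan looks like this (abridged from listSearchKey in adlist.c; server.clients has no match callback set, so the pointer-comparison branch is taken):

/* Search the list for a node whose value matches key; O(N) in the
 * list length (abridged from adlist.c). */
listNode *listSearchKey(list *list, void *key) {
    listIter iter;
    listNode *node;

    listRewind(list, &iter);
    while ((node = listNext(&iter)) != NULL) {
        if (list->match) {
            if (list->match(node->value, key)) return node;
        } else {
            if (key == node->value) return node;  /* plain pointer compare */
        }
    }
    return NULL;
}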

Optimization in 5.0

Redis 5.0 optimizes this away: createClient records the list node that holds the new client, so freeClient can delete that listNode directly, with no further traversal of server.clients.

At the end of createClient, linkClient is called:

if (fd != -1) linkClient(c);

// linkClient is implemented as follows:
/* This function links the client to the global linked list of clients.
 * unlinkClient() does the opposite, among other things. */
void linkClient(client *c) {
    listAddNodeTail(server.clients,c);
    /* Note that we remember the linked list node where the client is stored,
     * this way removing the client in unlinkClient() will not require
     * a linear scan, but just a constant time operation. */
    c->client_list_node = listLast(server.clients);
    uint64_t id = htonu64(c->id);
    raxInsert(server.clients_index,(unsigned char*)&id,sizeof(id),c,NULL);
}

As you can see, when the new client is appended to server.clients, the list node wrapping it is saved in c->client_list_node; at the same time, the client is inserted into a radix tree keyed by its unique client id.

When freeing a client, the call chain is still freeClient –> unlinkClient, but unlinkClient now deletes the saved node from the list directly, no traversal needed; the time complexity drops to O(1):

if (c->fd != -1) {
    /* Remove from the list of active clients. */
    if (c->client_list_node) {
        uint64_t id = htonu64(c->id);
        raxRemove(server.clients_index,(unsigned char*)&id,sizeof(id),NULL);
        listDelNode(server.clients,c->client_list_node);
        c->client_list_node = NULL;
    }

    /* Unregister async I/O handlers and close the socket. */
    aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
    aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
    close(c->fd);
    c->fd = -1;
}

Here the client is removed from the radix tree by its unique id, and removed from the server.clients list directly via the saved node pointer.

The radix tree (server.clients_index) exists to serve the CLIENT UNBLOCK command, which must look up a client by id.
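
The id lookup itself is a single radix-tree find (abridged from lookupClientByID in networking.c of Redis 5.0):

/* Look up a client by its 64-bit id via server.clients_index. */
client *lookupClientByID(uint64_t id) {
    id = htonu64(id);   /* keys are stored in big-endian byte order */
    client *c = raxFind(server.clients_index,
                        (unsigned char*)&id, sizeof(id));
    return (c == raxNotFound) ? NULL : c;
}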
